## datatable function from DT package create an HTML widget display of the dataset
## install DT package if the package is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |>
DT::datatable()
HR Analytics Employee Attrition and Performance
BCon 147: special topics
1 Project overview
In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.
2 Scenario
Imagine you are working as a data analyst for a mid-sized company that is experiencing high employee turnover, especially among high-performing employees. The company has been facing increased costs related to hiring and training new employees, and management is concerned about the negative impact on productivity and morale. The human resources (HR) team has collected historical employee data and now looks to you for actionable insights. They want to understand why employees are leaving and how to retain talent effectively.
Your task is to analyze the dataset and provide insights that will help HR prioritize retention strategies. These strategies could include interventions like revising compensation policies, improving job satisfaction, or focusing on work-life balance initiatives. The success of your analysis could lead to significant cost savings for the company and an increase in employee engagement and performance.
3 Understanding data source
The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. The dataset is particularly useful for exploring how factors such as job satisfaction, work-life balance, and training opportunities influence employee performance and attrition.
This dataset is well-suited for conducting in-depth analysis of employee performance and retention, enabling us to build predictive models that identify the key drivers of employee attrition. Additionally, we can assess the impact of various organizational factors, such as training and work-life balance, on both performance and retention outcomes.
4 Data wrangling and management
Libraries
Before we start working on the dataset, we need to load the libraries used for data wrangling, analysis, and visualization. Load all of them here. Packages that are not yet installed can be installed with the install.packages function. Some packages are introduced later in this project, so install them as needed and add them to this list.
# load all your libraries here
library(dplyr)
library(DT)
library(readr)
library(janitor)
library(ggplot2)
library(forcats)
library(scales)
library(GGally)
library(sjPlot)
library(ggstatsplot)
library(report)
4.1 Data importation
Import the two datasets, Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.
Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by. You may read more about the left_join function in the dplyr documentation.
Save the merged dataset as hr_perf_dta and display the dataset using the datatable function from the DT package.
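A left join keeps every row of the left table and repeats it once per matching row on the right. If an employee has several performance reviews, the merged data will therefore have more rows than the employee table. A minimal sketch with invented toy tables (the objects toy_employee, toy_rating, and toy_merged and all their values are hypothetical, not taken from the dataset):

```r
library(dplyr)

# Hypothetical toy tables; values are invented for illustration only
toy_employee <- tibble(
  EmployeeID = c("E1", "E2", "E3"),
  Department = c("Sales", "Technology", "Sales")
)
toy_rating <- tibble(
  EmployeeID    = c("E1", "E1", "E2"),  # E1 has two reviews, E3 has none
  ManagerRating = c(3, 4, 5)
)

# Every employee row is kept; E1 appears twice, E3 gets NA for ManagerRating
toy_merged <- left_join(toy_employee, toy_rating, by = "EmployeeID")

nrow(toy_employee)  # 3
nrow(toy_merged)    # 4
```

Comparing nrow() before and after the join is a quick way to see whether the merged dataset is at the employee level or at the review level.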
## import the two data here
employee_dta <- read_csv("dataset/Employee.csv")
perf_rating_dta <- read_csv("dataset/PerformanceRating.csv")
## merge employee_dta and perf_rating_dta using left_join function. Save the merged dataset as hr_perf_dta
hr_perf_dta <- left_join(employee_dta, perf_rating_dta, by = "EmployeeID")
##Use the datatable from DT package to display the merged dataset
datatable(hr_perf_dta)
4.2 Data management
Using the clean_names function from the janitor package, standardize the variable names by using the recommended naming of variables.
Save the renamed variables as hr_perf_dta to update the dataset.
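As a rough illustration of what clean_names does, the sketch below applies it to a one-row data frame whose CamelCase column names are assumed to resemble those in the raw files (toy_dta and its values are hypothetical):

```r
library(janitor)

# Hypothetical one-row data frame with CamelCase names
toy_dta <- data.frame(JobSatisfaction = 4, YearsAtCompany = 2, WorkLifeBalance = 3)

# clean_names() converts the column names to snake_case
toy_clean <- clean_names(toy_dta)
names(toy_clean)
```

The resulting snake_case names are the ones used in the rest of this project, such as job_satisfaction and years_at_company.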
## clean names using the janitor packages and save as hr_perf_dta
hr_perf_dta <- clean_names(hr_perf_dta) # Clean the variable names
## display the renamed hr_perf_dta using datatable function
datatable(hr_perf_dta)
Create a new variable cat_education wherein education is 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.
Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Re-code the values accordingly as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.
Create new variables cat_work_life_balance, cat_self_rating, and cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Re-code accordingly as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.
Create a new variable bi_attrition by transforming the attrition variable into a numeric variable. Re-code accordingly as No = 0 and Yes = 1.
Save all the changes in hr_perf_dta. Note that saving the changes with the same name will update the dataset with the new variables created.
## create cat_education
hr_perf_dta <- hr_perf_dta |>
mutate(cat_education = case_when(
education == 1 ~ "No formal education",
education == 2 ~ "High school",
education == 3 ~ "Bachelor",
education == 4 ~ "Masters",
education == 5 ~ "Doctorate",
TRUE ~ NA_character_ # Handle any unexpected values
))
## create cat_envi_sat, cat_job_sat, and cat_relation_sat
hr_perf_dta <- hr_perf_dta |>
mutate(
cat_envi_sat = case_when(
environment_satisfaction == 1 ~ "Very dissatisfied",
environment_satisfaction == 2 ~ "Dissatisfied",
environment_satisfaction == 3 ~ "Neutral",
environment_satisfaction == 4 ~ "Satisfied",
environment_satisfaction == 5 ~ "Very satisfied",
TRUE ~ NA_character_
),
cat_job_sat = case_when(
job_satisfaction == 1 ~ "Very dissatisfied",
job_satisfaction == 2 ~ "Dissatisfied",
job_satisfaction == 3 ~ "Neutral",
job_satisfaction == 4 ~ "Satisfied",
job_satisfaction == 5 ~ "Very satisfied",
TRUE ~ NA_character_
),
cat_relation_sat = case_when(
relationship_satisfaction == 1 ~ "Very dissatisfied",
relationship_satisfaction == 2 ~ "Dissatisfied",
relationship_satisfaction == 3 ~ "Neutral",
relationship_satisfaction == 4 ~ "Satisfied",
relationship_satisfaction == 5 ~ "Very satisfied",
TRUE ~ NA_character_
))
## create cat_work_life_balance, cat_self_rating, and cat_manager_rating
hr_perf_dta <- hr_perf_dta |>
mutate(
cat_work_life_balance = case_when(
work_life_balance == 1 ~ "Unacceptable",
work_life_balance == 2 ~ "Needs improvement",
work_life_balance == 3 ~ "Meets expectation",
work_life_balance == 4 ~ "Exceeds expectation",
work_life_balance == 5 ~ "Above and beyond",
TRUE ~ NA_character_
),
cat_self_rating = case_when(
self_rating == 1 ~ "Unacceptable",
self_rating == 2 ~ "Needs improvement",
self_rating == 3 ~ "Meets expectation",
self_rating == 4 ~ "Exceeds expectation",
self_rating == 5 ~ "Above and beyond",
TRUE ~ NA_character_
),
cat_manager_rating = case_when(
manager_rating == 1 ~ "Unacceptable",
manager_rating == 2 ~ "Needs improvement",
manager_rating == 3 ~ "Meets expectation",
manager_rating == 4 ~ "Exceeds expectation",
manager_rating == 5 ~ "Above and beyond",
TRUE ~ NA_character_
))
## create bi_attrition
hr_perf_dta <- hr_perf_dta |>
mutate(bi_attrition = case_when(
attrition == "No" ~ 0,
attrition == "Yes" ~ 1,
TRUE ~ NA_real_ # Handle any unexpected values
))
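case_when returns character columns, which tables and ggplot2 axes sort alphabetically (for example "Dissatisfied" before "Neutral" before "Satisfied"). Where the rating order matters, one alternative is factor() with explicit levels and labels. A minimal sketch, assuming a hypothetical ratings vector standing in for hr_perf_dta$job_satisfaction:

```r
# Hypothetical ratings vector standing in for hr_perf_dta$job_satisfaction
job_satisfaction <- c(5, 1, 3, NA, 2)

satisfaction_labels <- c("Very dissatisfied", "Dissatisfied", "Neutral",
                         "Satisfied", "Very satisfied")

# factor() maps codes 1..5 to labels and keeps them in rating order;
# codes outside 1..5 (and NA) become NA, like the TRUE ~ NA_character_ branch
cat_job_sat <- factor(job_satisfaction, levels = 1:5, labels = satisfaction_labels)

levels(cat_job_sat)
table(cat_job_sat, useNA = "ifany")
```

This is a swapped-in technique, not the case_when approach the task asks for; it is shown only because ordered labels make later plots easier to read.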
## print the updated hr_perf_dta using datatable function
datatable(hr_perf_dta)
5 Exploratory data analysis
5.1 Descriptive statistics of employee attrition
Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.
Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance. To compute the attrition rate, group the dataset by job role. Afterward, you can use the count function to get the frequency of attrition for each job role and then divide it by the total number of observations. Save the computation as pct_attrition. Do not forget to ungroup before storing the output. Store the output as attrition_rate_job_role.
The plot for the attrition rate across job_role has been done for you! Study each line of code. You have the freedom to customize your plot accordingly. Show your creativity!
## selecting attrition key variables and save as `attrition_key_var_dta`
attrition_key_var_dta <- hr_perf_dta |>
select(attrition, job_role, department, age, salary, job_satisfaction, work_life_balance)
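## compute the attrition rate across job_role and save as attrition_rate_job_role
## note: count() tallies all records per job role, so pct_attrition is each role's share of the records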
attrition_rate_job_role <- attrition_key_var_dta |>
group_by(job_role) |>
count() |>
ungroup() |>
mutate(pct_attrition = n / sum(n) * 100)
## print attrition_rate_job_role
print(attrition_rate_job_role)
# A tibble: 13 × 3
job_role n pct_attrition
<chr> <int> <dbl>
1 Analytics Manager 213 3.09
2 Data Scientist 1387 20.1
3 Engineering Manager 307 4.45
4 HR Business Partner 25 0.362
5 HR Executive 119 1.72
6 HR Manager 17 0.246
7 Machine Learning Engineer 582 8.44
8 Manager 145 2.10
9 Recruiter 152 2.20
10 Sales Executive 1567 22.7
11 Sales Representative 500 7.25
12 Senior Software Engineer 512 7.42
13 Software Engineer 1373 19.9
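In the table above, n counts every record in a job role, so pct_attrition is that role's share of all records rather than the share of its employees who left. If the within-role attrition rate is wanted, one possible sketch takes the mean of a Yes/No indicator inside each group, the same idea the salary computation later in this section uses. The data and the names toy_dta, toy_attrition_rate, and n_records are invented for illustration:

```r
library(dplyr)

# Hypothetical toy data: three recruiters (two left) and two managers (none left)
toy_dta <- tibble(
  job_role  = c("Recruiter", "Recruiter", "Recruiter", "Manager", "Manager"),
  attrition = c("Yes", "No", "Yes", "No", "No")
)

toy_attrition_rate <- toy_dta |>
  group_by(job_role) |>
  summarise(
    n_records     = n(),
    pct_attrition = mean(attrition == "Yes", na.rm = TRUE) * 100
  )
# summarise() drops the single grouping level, so no ungroup() is needed here

toy_attrition_rate
```

On the toy data this gives 0% for managers and about 66.7% for recruiters, which is the quantity a reader would normally expect under the label "attrition rate".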
# Plot the attrition rate by job role
attrition_rate_job_role |>
mutate(job_role = fct_reorder(job_role, pct_attrition)) |>
ggplot(aes(x = job_role, y = pct_attrition, fill = pct_attrition)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
scale_fill_gradientn(colors = c("#ffccd5", "#ff8fa3", "#ff758f", "#ff4d6d", "#c9184a"),
name = "Attrition Rate (%)") +
labs(title = "Attrition Rate by Job Role",
x = "Job Role",
y = "Attrition Rate (%)") +
theme_minimal(base_size = 14) +
coord_flip() +
theme(
legend.position = "none",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
)
## compute the attrition rate across department and save as attrition_rate_department
attrition_rate_department <- attrition_key_var_dta |>
group_by(department) |>
count() |>
ungroup() |>
mutate(pct_attrition = n / sum(n) * 100)
## print attrition_rate_department
print(attrition_rate_department)
# A tibble: 3 × 3
department n pct_attrition
<chr> <int> <dbl>
1 Human Resources 313 4.54
2 Sales 2211 32.0
3 Technology 4375 63.4
## Plot the attrition rate by department
attrition_rate_department |>
mutate(department = fct_reorder(department, pct_attrition)) |>
ggplot(aes(x = department , y = pct_attrition, fill = pct_attrition)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
scale_fill_gradient(low = "#ffccd5", high = "#c9184a") +
labs(title = "Attrition Rate by Department",
x = "Department",
y = "Attrition Rate (%)") +
theme_minimal(base_size = 14) +
coord_flip() +
theme(
legend.position = "none",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
)
## compute the attrition rate across age and save as attrition_rate_age
attrition_rate_age <- attrition_key_var_dta |>
group_by(age) |>
count() |>
ungroup() |>
mutate(pct_attrition = n / sum(n) * 100)
## print attrition_rate_age
print(attrition_rate_age)
# A tibble: 34 × 3
age n pct_attrition
<dbl> <int> <dbl>
1 18 58 0.841
2 19 119 1.72
3 20 149 2.16
4 21 254 3.68
5 22 324 4.70
6 23 264 3.83
7 24 472 6.84
8 25 597 8.65
9 26 545 7.90
10 27 412 5.97
# ℹ 24 more rows
## Plot the attrition rate by age
# Group age into 5-year intervals
attrition_rate_age <- attrition_rate_age |>
mutate(age_group = cut(age,
breaks = seq(15, 60, by = 5),
labels = paste0(seq(15, 55, by = 5), "-", seq(19, 59, by = 5)), # Custom labels for 5-year intervals
right = FALSE))
# Reorder age groups for better readability
attrition_rate_age <- attrition_rate_age |>
filter(!is.na(age_group) & !is.na(pct_attrition)) |>
mutate(age_group = factor(age_group, levels = rev(levels(age_group))))
# Plot attrition rate by age group
attrition_rate_age |>
ggplot(aes(x = age_group, y = pct_attrition, fill = pct_attrition)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
scale_fill_gradient(low = "#ffccd5", high = "#c9184a") +
labs(title = "Attrition Rate by Age Group",
x = "Age Group",
y = "Attrition Rate (%)") +
theme_minimal(base_size = 13) +
coord_flip() +
theme(
legend.position = "none",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
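)
## compute the attrition rate across salary ranges and save as attrition_rate_salary
## note: cut() returns NA for salaries outside (0, 200000], so values above 200,000 are not captured by the "150k+" label
attrition_rate_salary <- hr_perf_dta |>
mutate(salary_range = cut(salary, breaks = c(0, 50000, 100000, 150000, 200000),
labels = c("0-50k", "50k-100k", "100k-150k", "150k+"))) |>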
group_by(salary_range) |>
summarise(
total_employees = n(),
pct_attrition = mean(bi_attrition == 1, na.rm = TRUE) * 100)
## print attrition_rate_salary
print(attrition_rate_salary)
# A tibble: 5 × 3
salary_range total_employees pct_attrition
<fct> <int> <dbl>
1 0-50k 2142 51.9
2 50k-100k 2205 29.1
3 100k-150k 1017 20.5
4 150k+ 476 18.3
5 <NA> 1059 20.1
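The NA row in this output may come from salaries above 200,000, because cut() assigns NA to any value outside its breaks. A minimal sketch of an open-ended top bin, using three invented salaries (the vectors salary and salary_range here are hypothetical stand-ins, not the project data):

```r
# Hypothetical salaries, including one above 200,000
salary <- c(40000, 120000, 250000)

# Original breaks: the top bin stops at 200,000, so 250,000 becomes NA
cut(salary, breaks = c(0, 50000, 100000, 150000, 200000),
    labels = c("0-50k", "50k-100k", "100k-150k", "150k+"))

# Open-ended top bin: Inf as the last break keeps every salary above 150,000
salary_range <- cut(salary, breaks = c(0, 50000, 100000, 150000, Inf),
                    labels = c("0-50k", "50k-100k", "100k-150k", "150k+"))
salary_range
```

Whether this applies to the real data depends on the actual salary range, which can be checked with range(hr_perf_dta$salary, na.rm = TRUE).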
## Plot the attrition rate by salary
ggplot(attrition_rate_salary, aes(x = salary_range, y = pct_attrition, fill = pct_attrition)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
scale_fill_gradient(low = "#ffccd5", high = "#c9184a") +
labs(title = "Attrition Rate by Salary Group",
x = "Salary Range",
y = "Attrition Rate (%)") +
theme_minimal(base_size = 14) +
coord_flip() +
theme(
legend.position = "none",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
)
## compute the attrition rate across job_satisfaction and save as attrition_rate_job_satisfaction
attrition_rate_job_satisfaction <- attrition_key_var_dta |>
group_by(job_satisfaction) |>
count() |>
ungroup() |>
mutate(pct_attrition = n / sum(n) * 100)
## print attrition_rate_job_satisfaction
print(attrition_rate_job_satisfaction)
# A tibble: 6 × 3
job_satisfaction n pct_attrition
<dbl> <int> <dbl>
1 1 130 1.88
2 2 1674 24.3
3 3 1651 23.9
4 4 1685 24.4
5 5 1569 22.7
6 NA 190 2.75
## Plot the attrition rate by job_satisfaction
attrition_rate_job_satisfaction |>
filter(!is.na(job_satisfaction) & !is.na(pct_attrition)) |>
ggplot(aes(x = factor(job_satisfaction), y = pct_attrition, fill = factor(job_satisfaction))) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("#ffccd5", "#ff8fa3", "#ff758f", "#ff4d6d", "#c9184a")) +
labs(
title = "Attrition Rate by Job Satisfaction",
x = "Job Satisfaction",
y = "Attrition Rate (%)"
) +
theme_minimal() +
coord_flip() +
theme(
legend.position = "none",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
)
## compute the attrition rate across work_life_balance and save as attrition_rate_work_life_balance
attrition_rate_work_life_balance <- attrition_key_var_dta |>
group_by(work_life_balance) |>
count() |>
ungroup() |>
mutate(pct_attrition = n / sum(n) * 100)
## print attrition_rate_work_life_balance
print(attrition_rate_work_life_balance)
# A tibble: 6 × 3
work_life_balance n pct_attrition
<dbl> <int> <dbl>
1 1 121 1.75
2 2 1702 24.7
3 3 1670 24.2
4 4 1706 24.7
5 5 1510 21.9
6 NA 190 2.75
## Plot the attrition rate by work_life_balance
attrition_rate_work_life_balance |>
filter(!is.na(work_life_balance) & !is.na(pct_attrition)) |>
ggplot(aes(x = factor(work_life_balance), y = pct_attrition, fill = factor(work_life_balance))) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("#ffccd5", "#ff8fa3", "#ff758f", "#ff4d6d", "#c9184a")) +
labs(
title = "Attrition Rate by Work Life Balance",
x = "Work Life Balance",
y = "Attrition Rate (%)"
) +
theme_minimal() +
coord_flip() +
theme(
legend.position = "none",
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text = element_text(size = 10),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
)
5.2 Identifying attrition key drivers using correlation analysis
Conduct a correlation analysis of key variables: bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Use the cor() function to run the correlation analysis. Remove missing values using na.omit() before running the correlation analysis. Save the output in hr_corr.
Use a correlation matrix or heatmap to visualize the relationship between these variables and attrition. You can use the GGally package and its ggcorr function to visualize the correlation heatmap. You may explore this site for more information: ggcorr.
Discuss which factors seem most correlated with attrition and what that suggests about why employees are leaving.
## conduct correlation of key variables.
hr_corr_data <- hr_perf_dta |>
select(attrition, salary, years_at_company, job_satisfaction,
manager_rating, work_life_balance) |>
mutate(attrition = ifelse(attrition == "Yes", 1, 0))
hr_corr_data <- na.omit(hr_corr_data)
hr_corr <- cor(hr_corr_data)
## print hr_corr
print(hr_corr)
attrition salary years_at_company job_satisfaction
attrition 1.000000000 -0.211181478 -0.6896527798 0.0132368129
salary -0.211181478 1.000000000 0.2206442116 0.0053054850
years_at_company -0.689652780 0.220644212 1.0000000000 0.0008700583
job_satisfaction 0.013236813 0.005305485 0.0008700583 1.0000000000
manager_rating -0.007654429 -0.001596736 0.0178656879 -0.0158205481
work_life_balance 0.003428836 -0.001517145 0.0079339508 0.0417242942
manager_rating work_life_balance
attrition -0.007654429 0.003428836
salary -0.001596736 -0.001517145
years_at_company 0.017865688 0.007933951
job_satisfaction -0.015820548 0.041724294
manager_rating 1.000000000 0.007996938
work_life_balance 0.007996938 1.000000000
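To read the matrix with attrition in mind, it helps to rank the other variables by the absolute size of their correlation with attrition. The sketch below copies the rounded values from the attrition row of the output above into a hypothetical vector, attrition_cor:

```r
# Correlations with attrition, copied (rounded) from the hr_corr output above
attrition_cor <- c(
  salary            = -0.2112,
  years_at_company  = -0.6897,
  job_satisfaction  =  0.0132,
  manager_rating    = -0.0077,
  work_life_balance =  0.0034
)

# Rank predictors by the absolute size of their correlation with attrition
ranked <- attrition_cor[order(abs(attrition_cor), decreasing = TRUE)]
ranked

# With the real matrix, the same vector could be taken as
# hr_corr["attrition", colnames(hr_corr) != "attrition"]
```

The ranking puts years_at_company first and salary second; the remaining three variables are all within about 0.01 of zero.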
## install GGally package and use ggcorr function to visualize the correlation
ggcorr(hr_corr_data, label = TRUE, label_alpha = TRUE, low = "#ff758f", high = "#c9184a", digits = 4, name = "Correlation") +
theme_minimal(base_size = 10) +
theme(
plot.title = element_text(hjust = 0.7, size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
axis.text = element_text(size = 10),
plot.background = element_rect(fill = "#1982c4"),
legend.position = "right") +
labs(title = "Correlation Matrix", x = NULL, y = NULL)
Years at company: The strongest correlation with attrition is years at company, a strong negative correlation (r = -0.69). This indicates that employees who have been with the company for a longer period are less likely to leave.
Salary: Attrition has a weak negative correlation with salary (r = -0.21), indicating that lower-paid employees are somewhat more likely to leave.
Job satisfaction: The correlation between attrition and job satisfaction is close to zero (r = 0.01), so job satisfaction shows no meaningful linear relationship with attrition in this dataset.
Manager rating: The correlation between manager rating and attrition is also close to zero (r = -0.01), as is the correlation with work-life balance (r = 0.003).
The correlation analysis suggests that tenure and, to a lesser extent, salary are the factors most closely associated with employee attrition, while job satisfaction, manager rating, and work-life balance show almost no linear association. The company should therefore focus on retaining employees early in their tenure, for example through career development programs and performance recognition, and on reviewing compensation for lower-paid employees.
5.3 Predictive modeling for attrition
Create a logistic regression model to predict employee attrition using the following variables: salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Save the model as hr_attrition_glm_model. Print the summary of the model using the summary function.
Install the sjPlot package and use the tab_model function to display the summary of the model. You may read the sjPlot documentation on how to customize your model summary.
Also, use the plot_model function to visualize the model coefficients. You may read the sjPlot documentation on how to customize your model visualization.
Discuss the results of the logistic regression model and what they suggest about the factors that contribute to employee attrition.
## run a logistic regression model to predict employee attrition
## save the model as hr_attrition_glm_model
hr_attrition_glm_model <- glm(attrition ~ salary +
years_at_company + job_satisfaction +
manager_rating + work_life_balance,
data = hr_corr_data,
family = binomial)
## print the summary of the model using the summary function
summary(hr_attrition_glm_model)
Call:
glm(formula = attrition ~ salary + years_at_company + job_satisfaction +
manager_rating + work_life_balance, family = binomial, data = hr_corr_data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.571e+00 2.173e-01 11.831 <2e-16 ***
salary -3.633e-06 4.086e-07 -8.893 <2e-16 ***
years_at_company -6.333e-01 1.476e-02 -42.919 <2e-16 ***
job_satisfaction 3.470e-02 3.186e-02 1.089 0.276
manager_rating 5.071e-03 3.810e-02 0.133 0.894
work_life_balance 2.587e-02 3.198e-02 0.809 0.419
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8574.5 on 6708 degrees of freedom
Residual deviance: 4781.6 on 6703 degrees of freedom
AIC: 4793.6
Number of Fisher Scoring iterations: 5
## install sjPlot package and use tab_model function to display the summary of the model
tab_model(hr_attrition_glm_model,
title = "Logistic Regression Model for Employee Attrition",
show.ci = TRUE, # Show confidence intervals
show.se = TRUE, # Show standard errors
show.stat = TRUE) # Show statistics
|  | attrition |  |  |  |  |
| Predictors | Odds Ratios | Std. Error | CI | Statistic | p |
|---|---|---|---|---|---|
| (Intercept) | 13.08 | 2.84 | 0.00 – Inf | 11.83 | <0.001 |
| salary | 1.00 | 0.00 | 0.00 – Inf | -8.89 | <0.001 |
| years at company | 0.53 | 0.01 | 0.00 – Inf | -42.92 | <0.001 |
| job satisfaction | 1.04 | 0.03 | 0.00 – Inf | 1.09 | 0.276 |
| manager rating | 1.01 | 0.04 | 0.00 – Inf | 0.13 | 0.894 |
| work life balance | 1.03 | 0.03 | 0.00 – Inf | 0.81 | 0.419 |
| Observations | 6709 |  |  |  |  |
| R2 Tjur | 0.502 |  |  |  |  |
## use plot_model function to visualize the model coefficients
plot_model(hr_attrition_glm_model,
title = "Model Coefficients for Employee Attrition",
show.values = TRUE, # Show coefficient values
value.offset = 0.3, # Adjust the position of the values
vline.color = "#c9184a") # Color of the vertical no-effect reference line
The logistic regression model for employee attrition points to two significant predictors: salary and years at the company, both negatively associated with attrition (p < 0.001). Each additional year at the company is associated with roughly 47% lower odds of leaving (odds ratio 0.53). The salary odds ratio of 1.00 is expressed per single unit of salary, so its effect only becomes visible over larger salary differences. In contrast, job satisfaction (p = 0.276), manager rating (p = 0.894), and work-life balance (p = 0.419) did not show significant effects on attrition, suggesting that these factors are not strong predictors in this model.
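The odds ratios reported by tab_model are the exponentiated logit coefficients from summary(). The sketch below reproduces that arithmetic from the printed estimates; the 10,000 rescaling of salary is an illustrative choice, not something the original analysis does:

```r
# Estimates copied from the summary() output of hr_attrition_glm_model
coef_years  <- -6.333e-01   # years_at_company, per additional year
coef_salary <- -3.633e-06   # salary, per one unit of salary

# Exponentiating a logit coefficient gives an odds ratio
or_years      <- exp(coef_years)
or_salary_10k <- exp(coef_salary * 10000)  # rescaled to a 10,000 salary increase

round(or_years, 2)       # 0.53
round(or_salary_10k, 3)  # 0.964

# With the fitted model available: exp(coef(hr_attrition_glm_model))
```

Read this way, a 10,000 increase in salary is associated with about 3.6% lower odds of attrition, holding the other predictors fixed.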
5.4 Analysis of compensation and turnover
Compare the average monthly income of employees who left the company (bi_attrition = 1) and those who stayed (bi_attrition = 0). Use the t.test function to conduct a t-test and determine if there is a significant difference in average monthly income between the two groups. Save the results in a variable called attrition_ttest_results.
Install the report package and use the report function to generate a report of the t-test results.
Install the ggstatsplot package and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed. Make sure to map the bi_attrition variable to the x argument and the salary variable to the y argument.
Visualize the salary variable for employees who left and those who stayed using geom_histogram with geom_freqpoly. Make sure to facet the plot by the bi_attrition variable and apply alpha on the histogram plot.
Provide recommendations on whether revising compensation policies could be an effective retention strategy.
## compare the average monthly income of employees who left and those who stayed
attrition_ttest_results <- t.test(salary ~ bi_attrition, data = hr_perf_dta)
## print the results of the t-test
print(attrition_ttest_results)
Welch Two Sample t-test
data: salary by bi_attrition
t = 18.869, df = 5524.2, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
38577.82 47523.18
sample estimates:
mean in group 0 mean in group 1
125007.26 81956.76
## install the report package and use the report function to generate a report of the t-test results
report(attrition_ttest_results)
Effect sizes were labelled following Cohen's (1988) recommendations.
The Welch Two Sample t-test testing the difference of salary by bi_attrition
(mean in group 0 = 1.25e+05, mean in group 1 = 81956.76) suggests that the
effect is positive, statistically significant, and medium (difference =
43050.50, 95% CI [38577.82, 47523.18], t(5524.24) = 18.87, p < .001; Cohen's d
= 0.51, 95% CI [0.45, 0.56])
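The difference quoted in the report can be re-derived from the two group means printed by the t-test. The sketch below does that arithmetic and also expresses the gap relative to the stayers' mean; the variable names are hypothetical:

```r
# Group means copied from the t-test output above
mean_stayed <- 125007.26   # mean in group 0
mean_left   <- 81956.76    # mean in group 1

difference <- mean_stayed - mean_left
difference                                 # 43050.5

# Gap expressed relative to the stayers' mean salary, in percent
round(100 * difference / mean_stayed, 1)   # 34.4
```

In other words, employees who left earned on average about a third less than those who stayed.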
# install ggstatsplot package and use ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed
ggbetweenstats(
data = hr_perf_dta,
x = attrition,
y = salary,
title = "Distribution of Monthly Income by Employee Attrition",
xlab = "Employee Attrition",
ylab = "Monthly Income"
)
# create histogram and frequency polygon of salary for employees who left and those who stayed
ggplot(hr_perf_dta, aes(x = salary, fill = factor(attrition))) +
geom_histogram(alpha = 0.5, position = "identity", bins = 30, color = "white") +
geom_freqpoly(color = "black", linewidth = 1.2, bins = 30) + # thicker line, drawn as counts to match the histogram scale
facet_wrap(~ bi_attrition) +
scale_fill_manual(values = c("#ff758f", "#c9184a"),
labels = c("Stayed", "Left")) +
labs(title = "Salary Distribution by Attrition Status",
x = "Monthly Income",
y = "Count",
fill = "Attrition Status") +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.text = element_text(size = 10),
legend.position = "top",
legend.title = element_text(face = "bold"),
panel.grid.minor = element_blank())
Based on the results of the t-test and the visualizations, we can draw several conclusions regarding compensation policies as a retention strategy:
Significant Differences: The t-test shows a statistically significant difference in average salary between employees who stayed (mean 125,007.26) and those who left (mean 81,956.76), with p < .001 and a medium effect size (Cohen's d = 0.51). This suggests that compensation may play a crucial role in employee retention.
Salary Disparities: Employees who left earned about 43,050.50 less on average than those who stayed, which may indicate that the organization is not competitive in its compensation practices for these employees, leading to higher turnover.
Retention Strategies: Revising compensation policies to ensure competitive salaries could be an effective retention strategy. This may involve conducting market salary surveys, adjusting pay scales, or implementing performance-based bonuses.
Employee Feedback: Gathering feedback from employees regarding their compensation satisfaction can provide insights into potential areas for improvement.
Comprehensive Approach: While revising compensation is important, it should be part of a broader strategy that includes career development opportunities, work-life balance, and employee engagement initiatives to create a more holistic approach to retention.
5.5 Employee satisfaction and performance analysis
Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed. Use the group_by and summarise functions to calculate the average performance ratings for each group.
Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot. Use the ggplot function to create the plot and map the SelfRating variable to the x argument and the bi_attrition variable to the fill argument.
Similarly, visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot. Make sure to map the ManagerRating variable to the x argument and the bi_attrition variable to the fill argument.
Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition. Use the geom_boxplot function to create the plot and map the salary variable to the x argument, the job_satisfaction variable to the y argument, and the bi_attrition variable to the fill argument. You need to transform the job_satisfaction and bi_attrition variables into factors before creating the plot or within the ggplot function.
Discuss the results of the analysis and provide recommendations for HR interventions based on the findings.
# Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed.
average_ratings <- hr_perf_dta |>
group_by(attrition) |>
summarise(
mean_manager = mean(manager_rating, na.rm = TRUE),
mean_self = mean(self_rating, na.rm = TRUE))
# Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot.
ggplot(na.omit(hr_perf_dta), aes(x = factor(bi_attrition), fill = factor(self_rating))) +
geom_bar(position = "dodge", alpha = 0.8) +
geom_text(stat = "count", aes(label = after_stat(count)),
position = position_dodge(0.9), vjust = 2, size = 4, color = "white") +
scale_fill_manual(values = c("#ff758f", "#ff4d6d", "#c9184a"), name = "Self Rating") +
scale_x_discrete(labels = c("Stayed", "Left")) +
labs(title = "Distribution of Self Ratings by Attrition Status",
x = "Attrition Status",
y = "Count") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
legend.position = "top",
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
)
# Visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot.
ggplot(na.omit(hr_perf_dta), aes(x = factor(bi_attrition), fill = factor(manager_rating))) +
geom_bar(position = "dodge", alpha = 0.8) +
geom_text(stat = "count", aes(label = after_stat(count)),
position = position_dodge(0.9), vjust = 2, size = 4, color = "white") +
scale_fill_manual(values = c("#ff8fa3", "#ff758f", "#ff4d6d", "#c9184a"), name = "Manager Rating") +
scale_x_discrete(labels = c("Stayed", "Left")) +
labs(title = "Distribution of Manager Ratings by Attrition Status",
x = "Attrition Status",
y = "Count") +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
legend.position = "top",
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "#fff0f3")
)# create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.
# create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.
ggplot(na.omit(hr_perf_dta), aes(x = factor(bi_attrition), y = salary, fill = factor(job_satisfaction))) +
  geom_boxplot(alpha = 0.8) +
  scale_fill_manual(values = c("#ffccd5", "#ff8fa3", "#ff758f", "#ff4d6d", "#c9184a"), name = "Job Satisfaction") +
  labs(title = "Salary by Job Satisfaction and Attrition",
       x = "Attrition Status",
       y = "Salary") +
  scale_x_discrete(labels = c("Stayed", "Left")) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10),
    legend.position = "top",
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "#fff0f3")
  )
This plot reveals distinct trends between employees who stayed and those who left. Employees who stayed exhibit a wider salary distribution, particularly at higher job satisfaction levels (4 and 5), with median salaries increasing alongside satisfaction, suggesting that satisfied employees may receive better compensation. Conversely, the salary distributions for employees who left show less variation and generally lower median salaries across all satisfaction levels, indicating a weaker relationship between salary and job satisfaction for this group. This suggests that salary alone may not have been a significant factor in their decision to leave. Overall, the findings imply that companies may reward satisfied employees with higher pay, while other factors such as work-life balance and company culture could be more influential in driving attrition among those who left.
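The medians and spreads read off the boxplot can also be checked numerically. The following is a minimal sketch, assuming the column names used in the plot (bi_attrition, job_satisfaction, salary); the toy_hr data frame is hypothetical, with made-up values that stand in for hr_perf_dta.

```r
library(dplyr)

# Hypothetical toy data standing in for hr_perf_dta; the values are made up.
toy_hr <- data.frame(
  bi_attrition     = c(0, 0, 0, 0, 1, 1, 1, 1),
  job_satisfaction = c(2, 2, 4, 4, 2, 2, 4, 4),
  salary           = c(40000, 60000, 80000, 100000, 30000, 50000, 45000, 55000)
)

# Median and interquartile range of salary for each attrition/satisfaction group
salary_summary <- toy_hr |>
  group_by(bi_attrition, job_satisfaction) |>
  summarise(
    n = n(),
    median_salary = median(salary, na.rm = TRUE),
    iqr_salary = IQR(salary, na.rm = TRUE),
    .groups = "drop"
  )

salary_summary
```

Replacing toy_hr with the cleaned project data should give one row per box in the plot, which makes it easier to compare medians across groups than reading them off the figure.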
5.6 Work-life balance and retention strategies
At this point, you are already well aware of the dataset and the possible factors that contribute to employee attrition. Using your R skills, accomplish the following tasks:
Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed.
Use visualizations to show the differences.
Assess whether employees with poor work-life balance are more likely to leave.
You have the freedom how you will accomplish this task. Be creative and provide insights that will help HR develop effective retention strategies.
# Create a bar plot of work-life balance ratings by attrition status.
ggplot(na.omit(hr_perf_dta), aes(x = factor(work_life_balance), fill = factor(bi_attrition))) +
  geom_bar(position = "dodge") +
  labs(
    title = "Distribution of Work-Life Balance Ratings by Attrition Status",
    x = "Work-Life Balance Rating (1 = Unacceptable, 5 = Above and Beyond)",
    y = "Count",
    fill = "Attrition Status\n(0 = Stayed, 1 = Left)"
  ) +
  scale_fill_manual(values = c("#ffccd5", "#c9184a")) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    axis.text = element_text(size = 10),
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "#fff0f3")
  )
The bar plot comparing Work-Life Balance ratings between employees who stayed (Attrition = 0) and those who left (Attrition = 1) reveals notable trends regarding employee retention. For the lowest rating of 1, very few employees left, indicating that extreme dissatisfaction may not be prevalent among those who choose to stay. However, for a rating of 2, a considerable portion of employees who left is evident, suggesting that lower work-life balance ratings are associated with higher attrition rates. In the moderate ratings of 3 and 4, while the majority of employees remained, there is still a noticeable number of departures, indicating that even moderate dissatisfaction can impact retention. Conversely, for the highest rating of 5, the number of employees who stayed is considerably greater, with fewer departures, highlighting that high work-life balance ratings correlate with employee retention. Overall, these findings underscore the critical role of work-life balance in influencing employee decisions to stay with or leave the organization.
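Because the rating groups differ in size, raw counts can hide how likely employees at each rating are to leave. One way to assess this more directly is to compute the attrition rate within each rating. The sketch below assumes bi_attrition is coded 0 = stayed, 1 = left, as in the plot; the helper attrition_rate_by and the toy_hr data are hypothetical and use made-up values.

```r
library(dplyr)

# Hypothetical helper: share of employees who left within each level of a rating column.
# Assumes the data has a bi_attrition column coded 0 = stayed, 1 = left.
attrition_rate_by <- function(data, rating) {
  data |>
    group_by({{ rating }}) |>
    summarise(
      n_employees = n(),
      n_left = sum(bi_attrition == 1, na.rm = TRUE),
      attrition_rate = mean(bi_attrition == 1, na.rm = TRUE),
      .groups = "drop"
    )
}

# Made-up toy data standing in for hr_perf_dta
toy_hr <- data.frame(
  work_life_balance = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  bi_attrition      = c(1, 0, 1, 1, 0, 0, 0, 0, 0, 1)
)

wlb_rates <- attrition_rate_by(toy_hr, work_life_balance)
wlb_rates
```

Calling attrition_rate_by(hr_perf_dta, work_life_balance) on the project data would give the rate per rating. A similar view is available graphically by using position = "fill" in geom_bar, which rescales each bar to proportions.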
5.7 Recommendations for HR interventions
Based on the analysis conducted, provide recommendations for HR interventions that could help reduce employee attrition and improve overall employee satisfaction and performance. You may use the following questions as a guide for your recommendations and discussion.
What are the key factors contributing to employee attrition in the company?
Which factors are most strongly correlated with attrition?
What strategies could be implemented to improve employee retention and satisfaction?
How can HR leverage the insights from the analysis to develop effective retention strategies?
What are the potential benefits of implementing these strategies for the company?
Based on the analysis of employee attrition and the factors influencing it, several recommendations can be made for HR interventions aimed at reducing attrition and enhancing overall employee satisfaction and performance.
1. Key Factors Contributing to Employee Attrition
Work-Life Balance: The analysis indicates that employees with lower work-life balance ratings are more likely to leave. This suggests that dissatisfaction in this area is a significant contributor to attrition.
Job Satisfaction: Employees who report higher job satisfaction tend to stay longer, indicating that dissatisfaction in this area can lead to higher turnover.
Salary and Compensation: While salary may not be the sole factor, it is correlated with job satisfaction and can influence retention, especially among high-performing employees.
2. Factors Most Strongly Correlated with Attrition
Work-Life Balance Ratings: The analysis shows a clear correlation between low work-life balance ratings and higher attrition rates.
Job Satisfaction Levels: Higher job satisfaction ratings correlate with lower attrition, suggesting that improving satisfaction can help retain employees.
Salary Levels: Employees with higher salaries tend to have higher satisfaction and lower attrition rates, indicating that compensation is a relevant factor.
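The plots suggest these associations visually. One possible way to compare the factors on a common footing is a logistic regression of attrition on all three. This is only a sketch under stated assumptions: the data below are simulated purely for illustration (the coefficients in log_odds are arbitrary and not estimated from the project data), and the column names are assumed to match those used in the plots.

```r
set.seed(147)  # arbitrary seed so the simulated example is reproducible

# Simulated stand-in for hr_perf_dta; none of these values come from the real dataset.
n <- 500
toy_hr <- data.frame(
  work_life_balance = sample(1:5, n, replace = TRUE),
  job_satisfaction  = sample(1:5, n, replace = TRUE),
  salary            = round(runif(n, 30000, 150000))
)

# Simulated outcome with arbitrary illustrative effects (salary has no effect here)
log_odds <- 1 - 0.4 * toy_hr$work_life_balance - 0.3 * toy_hr$job_satisfaction
toy_hr$bi_attrition <- rbinom(n, 1, plogis(log_odds))

# Logistic regression: salary is rescaled to units of 10,000 for readable coefficients
fit <- glm(
  bi_attrition ~ work_life_balance + job_satisfaction + I(salary / 10000),
  data = toy_hr,
  family = binomial
)

# Odds ratios: values below 1 mean lower odds of leaving per one-unit increase
odds_ratios <- exp(coef(fit))
round(odds_ratios, 2)
```

Fitted on the real data, the odds ratios and the p-values from summary(fit) would indicate which factors remain associated with attrition once the others are held constant. Association in such a model still does not establish causation.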
3. Strategies to Improve Employee Retention and Satisfaction
Enhance Work-Life Balance:
Implement flexible working hours and remote work options to accommodate employees’ personal needs.
Promote a culture that values work-life balance, encouraging employees to take breaks and utilize vacation time.
Increase Job Satisfaction:
Conduct regular employee surveys to gather feedback on job satisfaction and areas for improvement.
Provide opportunities for professional development and career advancement to enhance job fulfillment.
Review Compensation Packages:
Conduct market research to ensure that salaries and benefits are competitive within the industry.
Consider performance-based bonuses and recognition programs to reward high-performing employees.
Foster a Positive Work Environment:
Encourage open communication and feedback between employees and management to build trust and transparency.
Organize team-building activities and social events to strengthen relationships among employees.
4. Leveraging Insights for Effective Retention Strategies
HR can utilize the insights from the analysis to tailor interventions that specifically address the factors contributing to attrition. By focusing on work-life balance, job satisfaction, and competitive compensation, HR can create targeted programs that resonate with employees’ needs. Regularly monitoring these factors through surveys and feedback mechanisms will allow HR to adapt strategies as necessary and ensure they remain effective.
5. Potential Benefits of Implementing These Strategies
Reduced Attrition Rates: By addressing the key factors contributing to attrition, the company can expect a decrease in turnover, leading to a more stable workforce.
Increased Employee Satisfaction: Enhancing work-life balance and job satisfaction will likely lead to higher employee morale and engagement, fostering a positive workplace culture.
Improved Performance: Satisfied employees are generally more productive and motivated, which can lead to better overall performance and outcomes for the company.
Cost Savings: Reducing attrition can save the company significant costs associated with recruitment, onboarding, and training new employees.
Enhanced Employer Brand: A reputation for valuing employee well-being and satisfaction can attract top talent, making the company a desirable place to work.
By implementing these strategies, HR can create a more supportive and engaging work environment that not only retains employees but also enhances their overall performance and satisfaction.